Shareware Grab Bag

Text File | 1985-12-10 | 14.5 KB | 352 lines

19 February, 1985                                        hdiff 1.14

Purpose
-------

hdiff is a utility which can compare two standard DOS text files
and isolate the differences between them. It can produce two
distinct types of reports on the differences.

First, hdiff can prepare a simple report of lines which appear in
the second file but not in the first (insertions), and of lines
which appear in the first file but not in the second (deletions).

Second, hdiff can produce a special "report" which is, in fact, an
Edlin script. This script, when applied to the first file, will
produce a clone of the second file. This second function of hdiff
is similar to the Unix utility "diff".

hdiff uses a file comparison algorithm which was developed by Paul
Heckel and described by D.E. Cortesi in Dr. Dobb's Journal #94
(August, 1984). The algorithm is substantially more efficient than
traditional file comparison methods; you will find that it can
generate a difference report between two files in little more than
the time it takes to read the two files.

This version of hdiff was derived from Cortesi's demonstration
program, with substantial modifications which

  -- accommodate differences between Edlin and CP/M's Ed (for which
     the demo was written)

  -- allow use of Edlin's block move capabilities

  -- allow for much larger files through the use of all available
     memory

  -- allow the use of command line parameters and switches,
     including case and spacing insensitivity

  -- allow the user to specify at run time the maximum number of
     lines which will be processed. This allows hdiff to use memory
     more efficiently.

  -- allow the user to request the simpler difference report rather
     than the Edlin script.

System requirements
-------------------

hdiff requires:

  -- IBM PC, PC/XT, PC/AT, or other MSDOS machine

  -- MSDOS 2.00 or later

  -- At least 128K of RAM. The more RAM you have, the larger the
     files you can process.

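For readers curious about the comparison technique named under Purpose, the following is a minimal sketch of the general Heckel approach, written in Python purely for illustration. It is not hdiff's source: the function names (heckel_match, simple_report), the table layout, and the order of the report lines are all assumptions made for this sketch. It pairs lines that occur exactly once in each file, grows those matches to neighbouring identical lines, and then lists whatever is left unmatched in the nnnn-/nnnn+ style of the simple report.

```python
def heckel_match(old, new):
    """Pair up lines of `old` and `new`; returns (old_to_new, new_to_old).

    Each list holds the partner's index, or None for an unmatched line.
    Illustrative sketch only; not taken from the hdiff source.
    """
    # Passes 1 and 2: count how often each distinct line occurs in each
    # file, remembering where it was last seen in the old file.
    table = {}
    for line in new:
        table.setdefault(line, {"nc": 0, "oc": 0, "olno": None})["nc"] += 1
    for j, line in enumerate(old):
        entry = table.setdefault(line, {"nc": 0, "oc": 0, "olno": None})
        entry["oc"] += 1
        entry["olno"] = j

    old_to_new = [None] * len(old)
    new_to_old = [None] * len(new)

    def try_link(i, j):
        # Link new[i] with old[j] if both exist, are unmatched, and are equal.
        if (0 <= i < len(new) and 0 <= j < len(old)
                and new_to_old[i] is None and old_to_new[j] is None
                and new[i] == old[j]):
            new_to_old[i], old_to_new[j] = j, i

    # Pass 3: a line that occurs exactly once in each file is an anchor.
    for i, line in enumerate(new):
        entry = table[line]
        if entry["nc"] == 1 and entry["oc"] == 1:
            try_link(i, entry["olno"])

    # Pass 4: grow matches downward, starting from a virtual matched
    # line that sits just before the first line of both files.
    for i in range(-1, len(new) - 1):
        j = -1 if i == -1 else new_to_old[i]
        if j is not None:
            try_link(i + 1, j + 1)

    # Pass 5: grow matches upward, starting from a virtual matched
    # line that sits just after the last line of both files.
    for i in range(len(new), 0, -1):
        j = len(old) if i == len(new) else new_to_old[i]
        if j is not None:
            try_link(i - 1, j - 1)

    return old_to_new, new_to_old


def simple_report(old, new):
    """List unmatched lines: deletions first, then insertions (assumed order)."""
    old_to_new, new_to_old = heckel_match(old, new)
    report = [f"{j + 1:04d}- {line}"
              for j, line in enumerate(old) if old_to_new[j] is None]
    report += [f"{i + 1:04d}+ {line}"
               for i, line in enumerate(new) if new_to_old[i] is None]
    return report
```

Because a moved line still finds its partner in the other file, it is matched and therefore never listed, which is consistent with the behaviour described for the simple report.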
Running hdiff
-------------

The general syntax for hdiff is:

    hdiff [-ecs] [-nnnn] oldfile.ext newfile.ext

The optional -e switch instructs hdiff to produce an Edlin script
file rather than the difference report.

The optional -c switch instructs hdiff to ignore differences in
case: "HDIFF" is the same as "hdiff".

The optional -s switch instructs hdiff to ignore differences in
spacing; all spaces and tabs are ignored for comparison purposes.

The optional -nnnn switch assists in memory management; it
represents the maximum number of lines hdiff will be required to
process, i.e., the number of lines in the larger of the two files.
The default for this value is 2000 lines; there is an absolute
maximum of 5000 lines. See the section on memory management for
more information about this switch.

The switches may be combined, and they may be in any order:
'-e -c -1000', '-1000ce', and '-ce1000' are all equivalent.

Examples:

    hdiff foo.c newfoo.c

compares file 'foo.c' with file 'newfoo.c' and produces a simple
report showing insertions (lines in newfoo which do not appear in
foo) and deletions (lines in foo which do not appear in newfoo).
Lines which have been moved but are otherwise unchanged do not
appear in this report.

    hdiff -ec foo.c newfoo.c

compares foo.c with newfoo.c, ignoring case differences, and
prepares an Edlin script. This script, if applied to foo, will
create a copy of newfoo. The script file is sent to the console, so
a more useful command is

    hdiff -e foo.c newfoo.c > foo.dat

which uses standard DOS redirection to send the edlin script to the
disk file foo.dat. Note that the program logo and error messages
are unaffected by redirection and will always be sent to the
screen.

    hdiff -e4000 foo.c newfoo.c > foo.dat

is equivalent to the previous command, except that it informs hdiff
that one of the files might contain up to 4000 lines.

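The switch rules described in this section (letters and a line count combined in any order, a default of 2000 lines, a ceiling of 5000) can be illustrated with a short Python sketch. It is an illustration only, not hdiff's own parser: the names Options, parse_switches and comparison_key are hypothetical, and rejecting unknown switches or out-of-range counts with an error is an assumption, since the document does not say how hdiff reacts to them.

```python
import re
from dataclasses import dataclass

DEFAULT_MAX_LINES = 2000   # default stated in the documentation
ABSOLUTE_MAX_LINES = 5000  # absolute maximum stated in the documentation


@dataclass
class Options:
    edlin_script: bool = False    # -e
    ignore_case: bool = False     # -c
    ignore_spacing: bool = False  # -s
    max_lines: int = DEFAULT_MAX_LINES  # -nnnn


def parse_switches(args):
    """Split arguments into (Options, file names); hypothetical helper."""
    opts = Options()
    files = []
    for arg in args:
        if not arg.startswith("-"):
            files.append(arg)
            continue
        # Break "-ce1000" into the tokens "c", "e", "1000".
        tokens = re.findall(r"\d+|[a-z]", arg[1:])
        if "".join(tokens) != arg[1:]:
            raise ValueError(f"unrecognised switch: {arg}")
        for token in tokens:
            if token.isdigit():
                opts.max_lines = int(token)
            elif token == "e":
                opts.edlin_script = True
            elif token == "c":
                opts.ignore_case = True
            elif token == "s":
                opts.ignore_spacing = True
            else:
                raise ValueError(f"unrecognised switch: -{token}")
    # Assumption: an out-of-range line count is treated as an error.
    if not 1 <= opts.max_lines <= ABSOLUTE_MAX_LINES:
        raise ValueError("max lines must be between 1 and 5000")
    return opts, files


def comparison_key(line, opts):
    """Form of a line used for comparison under the -c and -s switches."""
    if opts.ignore_spacing:
        line = line.replace(" ", "").replace("\t", "")
    if opts.ignore_case:
        line = line.lower()
    return line
```

With this sketch, '-e -c -1000', '-1000ce' and '-ce1000' all yield the same Options value, matching the equivalence stated above.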
Report formats
--------------

The difference report consists of lines in the format:

    nnnn+ text

or

    nnnn- text

The '+' format indicates that the line is new (an insertion); the
'-' format indicates that the line is gone (a deletion). Thus:

    0001- This line appears in the old file only
    0001+ This line appears in the new file only

The 'nnnn' represents the line number. For '+' lines, it's the line
number in the new file; for '-' lines, it's the line number in the
old file.

The Edlin script is a series of Edlin commands. See Edlin
documentation for their meanings; the only commands which will
appear are I (insert), D (delete), M (move), and E (End).

The script may look a little strange if you look at it (with an
editor or via the TYPE command). After the completion of each
insertion sequence, there will be a heart symbol; this is the
screen representation of Ctrl-C, which is used to terminate an
Edlin insertion.

Uses
----

The simplest use for hdiff is to compare two files to see if they
are the same. This can be used to check for corruption during
backups, copies, etc., or to determine which of two files is newer.

Even this simple use of hdiff can be useful in unexpected ways,
however. For example, look at this small batch file:

    dir a: > temp
    find "-" temp > dir.a
    dir b: > temp
    find "-" temp > dir.b
    hdiff dir.b dir.a > temp.bat
    erase dir.a
    erase dir.b
    erase temp

This batch can be used for a simple backup system. Assume that the
default directory in drive A contains a series of files that you
want to backup, and that the default directory in drive B contains
the same set of files from the last backup. The batch will isolate
differences between the two directories and prepare a file called
temp.bat which contains a list of those files which have been
changed or added since the last backup.
(The .bat extension is used because many popular text editors could
very easily convert the temp.bat file to a series of copy commands
which could be used, in batch mode, to perform the copying.)

The "Edlin" mode has potentially much more significant use. Perhaps
its greatest potential lies in what are known as "source code
control systems". These systems, quite common in mainframe and
minicomputer systems, allow programmers to maintain many
generations of program source text quite economically; rather than
storing each modified file in its entirety, only the original is
saved, along with a series of difference files. Hdiff provides a
first step in this direction for MSDOS machines (see the "Plans"
section below).

Typical use of the current hdiff would be something like this.
Assume that hdiff.scc contains an "original" version of hdiff; the
current version (1.10) is hdiff.c. First, the command

    hdiff -e hdiff.scc hdiff.c > hdiff.110

will create an edlin script which would convert hdiff.scc into
version 1.10 of hdiff.c. Typically, the actual hdiff.c file would
then be discarded (WARNING: see below. This program is
experimental!) As newer versions are developed, the same procedure
is used to create hdiff.111, hdiff.120, etc. Note that these
difference files would, in all likelihood, be much smaller than the
total size of all of the versions.

In order to "retrieve" an earlier version, say 1.00, the commands

    copy hdiff.scc hdiff.c
    edlin hdiff.c < hdiff.100

would convert hdiff.scc into version 1.00 of hdiff.

True source code control systems are considerably more efficient
than this "by hand" method, are much easier to use, and provide
significant features beyond mere storage of multiple versions.

For whatever it's worth, note that

    hdiff -e file1 file2 | edlin file1

is roughly equivalent to

    copy file2 file1

except that the original file1 is saved in file1.bak.

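The "base file plus difference files" idea described in this section can be sketched in a few lines of Python. This is an illustration of the storage scheme only: it uses Python's standard difflib module in place of hdiff's comparison and a plain list of replacements in place of an Edlin script, and the names make_delta and apply_delta are hypothetical.

```python
import difflib


def make_delta(base, version):
    """Record only the changes needed to turn `base` into `version`.

    Both arguments are lists of lines. Each delta entry is
    (start, end, replacement_lines), meaning base[start:end] is
    replaced by replacement_lines. Unchanged runs are not stored.
    """
    matcher = difflib.SequenceMatcher(a=base, b=version, autojunk=False)
    return [(i1, i2, version[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


def apply_delta(base, delta):
    """Rebuild a version from the base lines and one stored delta."""
    out = []
    pos = 0
    for start, end, replacement in delta:
        out.extend(base[pos:start])  # copy the unchanged run
        out.extend(replacement)      # then the changed lines
        pos = end                    # skip the replaced base lines
    out.extend(base[pos:])
    return out
```

As in the hand method above, every delta is taken against the same base, so any stored version can be rebuilt from the base and a single delta.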
cdelta and cget
---------------

The two demonstration batches, cdelta and cget, provide a quick
sample of the kinds of things that can be done with hdiff and
edlin. The two batches are designed for C programs; to revise them
for other languages, simply replace all references to ".c" with the
desired extension (.asm, for example).

The purpose of cdelta is to generate a change script which will
convert a "base" source file into a specified version of your
source. Cget performs the inverse task; it applies a specified
change file to the base and produces a file containing the
specified version.

File naming conventions are as follows:

    file.scc:   "base" source; scc = source code control
    file.###:   A change script to produce version ###
    file.c:     The current version (cdelta), or the output
                file (cget)

For example, suppose you are working with a C program called foo. A
base (earliest) version of this file should be in foo.scc. You have
just finished revision 1.10 of foo. To create the change file, type

    cdelta foo 110

The batch will create a new file, foo.110; this file is an Edlin
script which will convert foo.scc into version 1.10 of foo.c.

To retrieve a specified version, say 1.05, use

    cget foo 105

The batch will apply the script foo.105 to foo.scc and produce
foo.c, which will contain the source for version 1.05.

Note that cget always creates a file called file.c, overwriting any
existing file by that name. This implies that you do NOT keep your
current source in file.c; you keep the current source only by
retaining file.scc and the delta files.

Memory management
-----------------

Hdiff uses all available memory. The purpose of the -nnnn (max
number of lines) switch is to allow it to use memory more
efficiently, and to allow you to more effectively use hdiff in very
small or very large machines.

This is how it works. For each *potential* line, hdiff requires
approximately 34 bytes of storage for various tables.
The default configuration (space for 2000 lines) will thus require
about 68K bytes of data space for the tables. The remainder of
available memory (less the size of the program itself and a much
smaller amount of overhead data) is used to store the text read
from the files. Text storage space is required for each *unique*
line in either file.

If you have a small machine (i.e., less RAM), that much table space
will leave very little room for text storage; it may even be more
space than is available, and the program will not run at all. If
you find this to be the case, try reducing the number of lines via
the switch (-1000, or -500, for example).

Conversely, if you have a very large machine, you will have plenty
of space available to process files larger than 2000 lines. If that
is the case, increase the maxlines switch as necessary (but
remember that in no case can maxlines exceed 5000).

When hdiff is finished, it displays a message like:

    Storage use: 19%

This message tells you approximately what percentage of the total
available memory was actually used.

Restrictions
------------

The following act, in one way or another, as restrictions on hdiff:

  -- File format. Hdiff is intended as a DOS text file comparator
     only. It is NOT a replacement for the DOS utility 'comp'.
     Don't use it on binary (program or data) files, or on word
     processor files if they contain embedded control codes.

  -- Available memory (as discussed above)

  -- Actual size of the files. Edlin will read a file only until
     75% of its available memory is filled. Since Edlin uses only a
     maximum of 64K, this means that it will read only 48K of text.
     Hdiff cannot account for this problem, so the absolute maximum
     file size it can handle is approximately 48K.

  -- Line size. Limited to a maximum of 255 characters/line.

A Warning and A Plan
--------------------

Hdiff is experimental!
It has been in use for about six months (as of 19 Feb 1985) with no
known errors, but this is NOT to say that you should entrust your
only copy of a source file to hdiff! Please bear this in mind as
you use it. Please report any problems to me.

I intend, at some "unspecified future time", to incorporate hdiff
or a version of it in a larger source code control system. This
system would allow you to maintain multiple generations of program
source files very efficiently (in terms of storage requirements).
Some knotty problems relating to performance on a standard-issue PC
remain to be solved.

Comments or suggestions relating to this system are welcome. Tell
me what you would like to see.

In the meantime, a temporary "system" is available in the file
"sccs.lbr", which contains simple versions of get and delta
(written in C for performance reasons).

---------------------

hdiff and this document are Copyright (c) 1984, 1985 by:

    Christopher J. Dunford
    10057-2 Windstream Drive
    Columbia, Maryland 21044

    CompuServe 76703,2002
    Source STR211

You may copy and use hdiff for your personal use only. You may copy
hdiff for others, but you may not charge them for it. You may not
use hdiff for any commercial purpose whatsoever.

Address comments to the author at the above address, at CompuServe
(preferably) or at the Source (occasionally).

Hdiff is written in C and compiled using the Computer Innovations
C86 compiler (Version 2.13), large model.